Risk Patterns & Operational Implications

Author

jl3205

Code
suppressMessages(reticulate::py_install("scipy"))
suppressMessages(reticulate::py_install("plotly"))
suppressMessages(reticulate::py_install("scikit-learn"))

This analysis builds on the EDA by uncovering systematic risk disparities, emerging threat patterns, and potential predictive signals. Here, the focus shifts to understanding why these patterns emerge, how they cluster, and what they imply for decision-making in the field.

Victim Disparity: National vs. International Staff

Are national staff disproportionately affected by violence?

The following analysis compares harm profiles between national and international aid workers across three key outcomes: fatalities, injuries, and abductions.

  • The results confirm a key insight—national staff are often on the frontlines of humanitarian crises, bearing the brunt of operational risk.
Code
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import chi2_contingency
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import plotly.express as px

df = pd.read_csv("clean_data/security_incidents_cleaned.csv")


def plot_victim_impact(df, national_col, international_col, label):
    filtered_df = df[(df[national_col] > 0) | (df[international_col] > 0)]

    long_df = pd.DataFrame({
        label: pd.concat([
            filtered_df[national_col],
            filtered_df[international_col]
        ], ignore_index=True),
        "Role": ["Nationals"] * len(filtered_df) + ["Internationals"] * len(filtered_df)
    })

    t_stat, p_val = ttest_ind(
        filtered_df[national_col],
        filtered_df[international_col],
        equal_var=False
    )

    plt.figure(figsize=(8, 5))
    sns.boxplot(
        data=long_df,
        x="Role", y=label,
        hue="Role", dodge=False,
        palette={"Nationals": "#4EA8DE", "Internationals": "#F4A261"}
    )
    plt.title(label, fontsize=14, weight="bold")
    plt.ylabel("Amount")
    plt.xlabel("")
    plt.tick_params(labelsize=11)
    plt.grid(axis="y", linestyle="--", alpha=0.4)

    legend = plt.gca().get_legend()
    if legend:
        legend.remove()

    annotation = f"P < 0.0001" if p_val < 0.0001 else f"T = {t_stat:.2f}\nP = {p_val:.4f}"
    plt.text(
        0.5, plt.ylim()[1]*0.85,
        annotation,
        ha='center', va='top',
        fontsize=14, fontweight='bold', color='white',
        bbox=dict(facecolor='black', edgecolor='white',
                  boxstyle='round,pad=0.6')
    )

    plt.tight_layout()
    plt.show()


plot_victim_impact(df, "Nationals killed",
                   "Internationals killed", "Fatalities")

Figure 8: Fatalities are significantly higher among nationals - stressing the disproportionate frontline exposure in high-threat environments

Code
plot_victim_impact(df, "Nationals wounded",
                   "Internationals wounded", "Injuries")

Figure 9: Injury counts further highlight the disproportionate risk faced by nationals in high-risk deployments.

Code
plot_victim_impact(df, "Nationals kidnapped",
                   "Internationals kidnapped", "Abductions")

Figure 10: Abduction risk also skews heavily towards nationals, reflecting mobility and protectionn disparities.

Operational Asymmetry

Across all three types of harm, nationals consistently face significantly higher rates of violence. This structural risk disparity underscores the need to reassess field deployment strategies, training programs, and protective measures specifically tailored to national teams.

Modus Operandi by Threat Actor

Do certain actor types favor particular forms of violence?

Understanding the relationship between perpetrator type and their chosen tactics provides valuable insight into operational risk profiling.

  • This matrix helps surface behavioral patterns, enabling field teams to anticipate the nature of threats they are likely to face.
Code
contingency = pd.crosstab(df["Actor type"], df["Means of attack"])
chi2, p, dof, expected = chi2_contingency(contingency)

# print(f"Chi-square statistic: {chi2:.2f}, p-value: {p:.4f}")

plt.figure(figsize=(10, 6))
sns.heatmap(contingency, cmap="Reds", annot=True, fmt="d")
plt.title("Actor Type vs. Means of Attack")
plt.ylabel("Actor Type")
plt.xlabel("Means of Attack")
plt.tight_layout()
plt.show()

Figure 11: This matrix reveals clear associations between threat actor types and their preferred methods of attack.

Archetypes of Threat

Can we group incidents into clusters that reveal patterns in threat profiles?

Using PCA and KMeans clustering, I reduced the ‘incident’ feature space to reveal emergent threat archetypes. These clusters can help field actors recognize common attack profiles and identify anomalies.

  • Notably, several outlier incidents lie far beyond the main cluster, suggesting the presence of atypical threat scenarios—such as isolated, high-casualty events.
Code
features = df[["UN", "INGO", "ICRC", "NNGO", "Other", "Nationals killed",
               "Internationals killed", "Means of attack", "Actor type"]].copy()

features = pd.get_dummies(
    features, columns=["Means of attack", "Actor type"], drop_first=True)

X = StandardScaler().fit_transform(features.fillna(0))

kmeans = KMeans(n_clusters=4, random_state=42)
df["Cluster"] = kmeans.fit_predict(X)

pca = PCA(n_components=2)
components = pca.fit_transform(X)
df["PC1"] = components[:, 0]
df["PC2"] = components[:, 1]

fig = px.scatter(df, x="PC1", y="PC2", color="Cluster",
                 title="Incident Clusters (Threat Archetypes)",
                 hover_data=["Means of attack", "Actor type"])
fig.show()

Figure 12: PCA combined with KMeans clustering reveals distinct archetypes of threat incidents. Most events fall into dense clusters, indicating recurring operational patterns, while a handful of outlier points highlight rare or extreme scenarios—such as isolated, high-casualty attacks. These insights lay the groundwork for risk profiling and the development of early warning systems in humanitarian security planning.

Predict to Protect

Can we predict whether an attack will be fatal based on key attributes?

Logistic regression is defined as a statistical modeling technique used to predict binary outcomes — such as whether an incident results in a fatality — based on a set of input features.

As proof of concept, I trained a logistic regression model using organizational presence, perpetrator type, and method of attack to predict whether an incident results in fatalities.

  • This highlights the potential for automated threat assessment systems within humanitarian operations.
Why Logistic Regression?

Logistic regression was selected for its simplicity and interpretability — key in high-stakes decision environments. The model incorporates class balancing to address skewed fatality rates and ensure sensitivity to rare, high-risk outcomes.

Code
np.random.seed(5300)

df["Fatal"] = (df["Nationals killed"] + df["Internationals killed"]) > 0
features = df[["UN", "INGO", "ICRC", "NNGO", "Means of attack", "Actor type"]]
features = pd.get_dummies(features, drop_first=True).fillna(0)

X = features
y = df["Fatal"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(
    max_iter=1000,
    class_weight='balanced',
    solver='liblinear',
    random_state=42
)
model.fit(X_train, y_train)
LogisticRegression(class_weight='balanced', max_iter=1000, random_state=42,
                   solver='liblinear')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Code
y_pred = model.predict(X_test)

report_dict = classification_report(y_test, y_pred, output_dict=True)
report_df = pd.DataFrame(report_dict).T.loc[["False", "True"], [
    "precision", "recall", "f1-score"]]
report_df.index = ["No Fatality", "Fatality"]

report_df.plot(kind='bar', figsize=(9, 5), color=[
               "#4EA8DE", "#F4A261", "#E76F51"])
plt.title("Logistic Regression Performance by Class",
          fontsize=14, weight="bold")
plt.ylabel("Score")
_ = plt.ylim(0, 1.05)
_ = plt.xticks(rotation=0)
plt.grid(axis='y', linestyle='--', alpha=0.4)
_ = plt.legend(title="Metric", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.show()

Figure 13: This proof-of-concept model demonstrates the viability of forecasting fatal attacks using features such as organizational footprint and threat type.

  • Field teams could leverage models like this to prioritize risk assessments for high-profile deployments or unusual attacker–tactic pairings.